Descriptive Schema: Semantics-based Query Answering
Abstract
We propose the novel concept of a “descriptive schema” (DS). Unlike an ordinary database schema, a DS does not restrict the structure of the underlying database. Rather, it is a probabilistic description of that structure. When answering keyword queries, a DS can be used to improve semantics-based query answering and result ranking.

1 Schema: To have or not to have?

Wikipedia is a rich repository of information. However, facilities to exploit the information are still limited. Although typical WWW search engines such as Google [1] allow users to look for information using keywords, they lack a schema for formulating the queries precisely. Besides hyperlinks among the Wikipedia pages, many pages have Category tags as well as Infoboxes, which can be exploited to perform more sophisticated searches. For example, the DBpedia community makes use of these tags to build a database of RDF triples, allowing more expressive and precise SPARQL queries to be used to retrieve useful information [2].

The above are two extremes of search and query. In the former case, the user can perform a search easily using relevant keywords, without having to learn the schema’s lexicon beforehand. In the latter case, a schema can be used to help specify the query more precisely, but it has a non-trivial learning curve. In this paper, we propose the approach of “descriptive schema” to address these shortcomings. We attempt to strike a balance between the ease of use of a schema-less approach and the high accuracy that a schema-based system can bring.

2 Descriptive Schema

In this paper, we propose a new concept called “Descriptive Schema” (DS). Unlike XSD (XML Schema Definition), a DS is not meant to prescriptively mandate a structure on the underlying data. We want to retain the flexibility of free format for the pages. Rather, a DS, as its name implies, is descriptive. It is only a summary of the structure exhibited by the underlying database. It does not define the structure, and the data may occasionally violate the DS.

This tolerance of violations is our biggest innovation and contrasts with existing approaches. Existing approaches to data modelling use a “Prescriptive Schema”, which mandates a rigid structure on the underlying data, with little (if any) tolerance of violations.

We model a DS by a set of rules on the underlying data. There are many possible ways to formulate the rules. One example rule is: “90% of the time, a page of class ‘Countries’ has a value for the field ‘capital’ in the infobox (infobox for countries)”. Note that rules defined in this way are probabilistic, because they are not satisfied all the time. A DS may thus be considered a summary of the patterns occurring in a database, instead of policies imposed on the data. Discovering a DS from a database is a mining task: the problem of finding all rules that satisfy the specified syntax and support thresholds, following the data mining model in [3].
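As an illustration of rules of the form “X% of the time, a page of class C has a value for field F”, the following minimal Python sketch mines such rules from a toy collection of pages. It is not the authors' algorithm: the page representation (`class` and `infobox` keys), the `Rule` type, the function `mine_rules`, and the specific definitions of support (number of pages in the class) and confidence (fraction of those pages with a non-empty field) are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule: pages of `page_class` usually have `field` filled in."""
    page_class: str
    field: str
    confidence: float  # fraction of pages of the class with a non-empty field
    support: int       # number of pages of the class (assumed definition)


def mine_rules(pages, min_support=2, min_confidence=0.5):
    """Return all (class, field) rules meeting both thresholds."""
    class_counts = Counter()
    field_counts = defaultdict(Counter)
    for page in pages:
        cls = page["class"]
        class_counts[cls] += 1
        for field, value in page["infobox"].items():
            if value:  # count only fields that actually carry a value
                field_counts[cls][field] += 1

    rules = []
    for cls, total in class_counts.items():
        if total < min_support:
            continue  # too few pages to describe this class reliably
        for field, n in field_counts[cls].items():
            confidence = n / total
            if confidence >= min_confidence:
                rules.append(Rule(cls, field, confidence, total))
    return sorted(rules, key=lambda r: (r.page_class, -r.confidence, r.field))


# Toy data (invented for illustration): 10 country pages, 9 with a capital,
# 3 with an anthem. One page violates the "capital" rule, which is tolerated.
pages = []
for i in range(10):
    infobox = {}
    if i < 9:
        infobox["capital"] = f"Capital {i}"
    if i < 3:
        infobox["anthem"] = f"Anthem {i}"
    pages.append({"class": "Countries", "infobox": infobox})

rules = mine_rules(pages, min_support=5, min_confidence=0.5)
for rule in rules:
    print(f"{rule.confidence:.0%} of '{rule.page_class}' pages "
          f"have a value for '{rule.field}' (support={rule.support})")
```

On this toy data, only the ‘capital’ rule passes the thresholds. The page without a capital is not rejected; it only lowers the rule's confidence, which is the descriptive (rather than prescriptive) behaviour described above.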
Related resources
Optimizing Reformulation-based Query Answering in RDF
Reformulation-based query answering is a query processing technique aiming at answering queries under constraints. It consists of reformulating the query based on the constraints, so that evaluating the reformulated query directly against the data (i.e., without considering any more the constraints) produces the correct answer set. In this paper, we consider optimizing reformulation-based query...
Inconsistency-tolerant query answering in ontology-based data access
Ontology-based data access (OBDA) is receiving great attention as a new paradigm for managing information systems through semantic technologies. According to this paradigm, a Description Logic ontology provides an abstract and formal representation of the domain of interest to the information system, and is used as a sophisticated schema for accessing the data and formulating queries over them....
Data exchange: query answering for incomplete data sources
Data exchange is the problem of transforming data structured under a schema, called the source schema, into data structured under another schema, called the target schema. Existing work on data exchange considers settings where the source instance does not contain incomplete information. In this paper we study semantics and address algorithmic issues for data exchange settings where the source ...
New Inconsistency-Tolerant Semantics for Robust Ontology-Based Data Access
In ontology-based data access (OBDA) [17], an ontology provides an abstract and formal representation of the domain of interest, which is used as a virtual schema when formulating queries over the data. Current research in OBDA mostly focuses on ontology specification languages for which conjunctive query answering is first-order (FO) rewritable. In a nutshell, FO-rewritability means that query...
Answering SPARQL queries modulo RDF Schema with paths
SPARQL is the standard query language for RDF graphs. In its strict instantiation, it only offers querying according to the RDF semantics and would thus ignore the semantics of data expressed with respect to (RDF) schemas or (OWL) ontologies. Several extensions to SPARQL have been proposed to query RDF data modulo RDFS, i.e., interpreting the query with RDFS semantics and/or considering externa...
Publication date: 2008